Real time clustering of time series using triangular potentials
Motivated by the problem of computing investment portfolio weightings, we
investigate various methods of clustering as alternatives to traditional
mean-variance approaches. Such methods offer significant practical benefits
since they remove the need to invert a sample
covariance matrix, which can suffer from estimation error and will almost
certainly be non-stationary. The general idea is to find groups of assets which
share similar return characteristics over time and treat each group as a single
composite asset. We then apply inverse volatility weightings to these new
composite assets. In the course of our investigation we devise a method of
clustering based on triangular potentials and we present associated theoretical
results as well as various examples based on synthetic data.
Comment: AIFU1
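The clustering-then-weighting scheme described above can be sketched as follows. This is a minimal illustration on synthetic returns: a hard-coded cluster assignment stands in for the paper's triangular-potential clustering, and the composite-asset construction (equal-weight members, then inverse-volatility weight each composite) is an assumption about the simplest form of the procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic daily returns for 6 assets (rows: days, cols: assets).
returns = rng.normal(0.0, 0.01, size=(250, 6))

# Hypothetical cluster labels standing in for the triangular-potential
# clustering output (illustrative only).
labels = np.array([0, 0, 1, 1, 2, 2])

weights = np.zeros(returns.shape[1])
for c in np.unique(labels):
    members = np.flatnonzero(labels == c)
    # Treat the cluster as one composite asset: equal-weight its members.
    composite = returns[:, members].mean(axis=1)
    # Inverse-volatility weight for the composite, split across members.
    weights[members] = (1.0 / composite.std()) / len(members)

weights /= weights.sum()  # normalize to a fully invested portfolio
```

Note that no covariance matrix is inverted anywhere: only per-composite volatilities are estimated, which is the practical benefit the abstract highlights.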
An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit
We study the problem of information sharing and cooperation in Multi-Player
Multi-Armed bandits. We propose the first algorithm that achieves logarithmic
regret for this problem. Our results are based on two innovations. First, we
show that a simple modification to a successive elimination strategy can be
used to allow the players to estimate their suboptimality gaps, up to constant
factors, in the absence of collisions. Second, we leverage the first result to
design a communication protocol that successfully uses the small reward of
collisions to coordinate among players, while preserving meaningful
instance-dependent logarithmic regret guarantees.
Comment: 44 pages
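The first ingredient above, estimating suboptimality gaps via successive elimination, can be sketched in the single-player case. This is a generic successive-elimination loop, not the paper's algorithm: the arm means, confidence radii, and the rule "record twice the radius at elimination time as the gap estimate" are illustrative assumptions; the collision-based communication protocol is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.45, 0.55, 0.80])  # illustrative Bernoulli arm means
n_arms = len(means)

active = set(range(n_arms))
pulls = np.zeros(n_arms)
sums = np.zeros(n_arms)
gap_estimates = {}  # arm -> gap estimate, correct up to constant factors

for rnd in range(1, 3001):
    for a in active:  # pull every surviving arm once per round
        sums[a] += float(rng.random() < means[a])
        pulls[a] += 1
    est = sums / np.maximum(pulls, 1)
    # Anytime confidence radius (union bound over arms and rounds).
    radius = np.sqrt(np.log(4.0 * n_arms * rnd * rnd) / (2.0 * np.maximum(pulls, 1)))
    best_lcb = max(est[a] - radius[a] for a in active)
    for a in sorted(active):
        if est[a] + radius[a] < best_lcb:
            # The radius at elimination time tracks the true gap of arm a
            # up to constant factors.
            gap_estimates[a] = 2.0 * radius[a]
            active.remove(a)
    if len(active) == 1:
        break
```

The point of the sketch is that elimination times themselves carry gap information, which is what the players exploit without needing collisions to communicate.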
Robustness Guarantees for Mode Estimation with an Application to Bandits
Mode estimation is a classical problem in statistics with a wide range of
applications in machine learning. Despite this, little is understood about
its robustness properties under possibly adversarial data contamination. In
this paper, we give precise robustness guarantees as well as privacy guarantees
under simple randomization. We then introduce a theory for multi-armed bandits
where the values of interest are the modes of the reward distributions rather than the means.
We prove regret guarantees for the problems of top arm identification, top
m-arms identification, contextual modal bandits, and infinite continuous arms
top arm recovery. We show in simulations that our algorithms are robust to
perturbation of the arms by adversarial noise sequences, thus rendering modal
bandits an attractive choice in situations where the rewards may have outliers
or adversarial corruptions.
Comment: 12 pages, 7 figures, 14 appendix pages
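The robustness contrast between mode and mean can be seen with a toy experiment. This is a generic histogram-based mode estimator, not the paper's estimator, and the 5% contamination level and bin count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Clean samples concentrated near 0; the distribution's mode is ~0.
clean = rng.normal(0.0, 1.0, size=1000)
# Adversarial contamination: 5% of points moved to an extreme value.
contaminated = clean.copy()
contaminated[:50] = 100.0

def mode_estimate(x, bins=50):
    """Histogram-based mode estimate: midpoint of the most populated bin."""
    counts, edges = np.histogram(x, bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

# The sample mean is dragged far from 0 by the outliers,
# while the mode estimate stays near the bulk of the data.
mean_shift = contaminated.mean() - clean.mean()
mode_shift = mode_estimate(contaminated) - mode_estimate(clean)
```

Here the mean moves by roughly 0.05 x 100 = 5 under contamination, while the most populated histogram bin, and hence the mode estimate, stays near zero.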
Anytime Model Selection in Linear Bandits
Model selection in the context of bandit optimization is a challenging
problem, as it requires balancing exploration and exploitation not only for
action selection, but also for model selection. One natural approach is to rely
on online learning algorithms that treat different models as experts. Existing
methods, however, scale poorly (poly M) with the number of models M
in terms of their regret. Our key insight is that, for model selection in
linear bandits, we can emulate full-information feedback to the online learner
with a favorable bias-variance trade-off. This allows us to develop ALEXP,
which has an exponentially improved (log M) dependence on M for its
regret. ALEXP has anytime guarantees on its regret, and neither requires
knowledge of the horizon n, nor relies on an initial purely exploratory
stage. Our approach utilizes a novel time-uniform analysis of the Lasso,
establishing a new connection between online learning and high-dimensional
statistics.
Comment: 37 pages, 7 figures
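The log M dependence comes from running an exponential-weights learner over the M models with (emulated) full-information feedback. The sketch below shows only that generic mechanism: the Bernoulli loss rates, horizon, and learning rate are illustrative assumptions, and ALEXP's Lasso-based construction of the full-information loss estimates is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)
M, T = 50, 2000
# Hypothetical Bernoulli loss rates for M candidate models; model 0 is best.
loss_rates = np.full(M, 0.6)
loss_rates[0] = 0.4

eta = np.sqrt(np.log(M) / T)  # standard exponential-weights learning rate
log_w = np.zeros(M)
for _ in range(T):
    losses = (rng.random(M) < loss_rates).astype(float)
    # Full-information update: every model's loss is observed each round,
    # which is the kind of feedback ALEXP emulates for the online learner.
    log_w -= eta * losses

# Posterior weights over models after T rounds.
probs = np.exp(log_w - log_w.max())
probs /= probs.sum()
```

With full-information feedback, exponential weights suffers regret of order sqrt(T log M), which is where the logarithmic rather than polynomial dependence on M originates.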